INTRA-FRAME QUANTIZER SELECTION FOR VIDEO COMPRESSION



BACKGROUND OF THE INVENTION 

The present invention relates to image processing, and, in particular, to video compression.

Cross-Reference to Related Applications

This application claims the benefit of the filing date of U.S. provisional application no.
60/100,939, filed on 09/18/98 as attorney docket no. SAR 12728PROV.

Description of the Related Art 

The goal of video compression processing is to encode image data to reduce the number of bits
used to represent a sequence of video images while maintaining an acceptable level of quality in the
decoded video sequence. This goal is particularly important in certain applications, such as
videophone or video conferencing over POTS (plain old telephone service) or ISDN (integrated
services digital network) lines, where the existence of limited transmission bandwidth requires careful
control over the bit rate, that is, the number of bits used to encode each image in the video sequence.
Furthermore, in order to satisfy the transmission and other processing requirements of a video
conferencing system, it is often desirable to have a relatively steady flow of bits in the encoded video
bitstream.

Achieving a relatively uniform bit rate can be very difficult, especially for video compression 
algorithms that encode different images within a video sequence using different compression 
techniques. Depending on the video compression algorithm, images may be designated as the 
following different types of frames for compression processing: 

o An intra (I) frame, which is encoded using only intra-frame compression techniques,

o A predicted (P) frame, which is encoded using inter-frame compression techniques based on a
previous I or P frame, and which can itself be used as a reference frame to encode one or more other
frames,

o A bi-directional (B) frame, which is encoded using bi-directional inter-frame compression
techniques based on a previous I or P frame and a subsequent I or P frame, and which cannot be used
to encode another frame, and

o A PB frame, which corresponds to two images (a P frame and a B frame in between the P
frame and the previous I/P frame) that are encoded as a single frame, as in the H.263 video
compression algorithm.
Depending on the actual image data to be encoded, these different types of frames typically require
different numbers of bits to encode. For example, I frames typically require the greatest numbers of
bits, while B frames typically require the least.

In a typical transform-based video compression algorithm, a block-based transform, such as a
discrete cosine transform (DCT), is applied to blocks of image data corresponding either to pixel
values or pixel differences generated, for example, based on a motion-compensated inter-frame
differencing scheme. The resulting transform coefficients for each block are then quantized for
subsequent encoding (e.g., run-length encoding followed by variable-length encoding). The degree to
which the transform coefficients are quantized directly affects both the number of bits used to represent
the image data and the quality of the resulting decoded image. This degree of quantization is also
referred to as the quantization level, which is often represented by a specified quantizer value that is
used to quantize the transform coefficients. In general, higher quantization levels imply fewer bits and
lower quality. As such, the quantizer is often used as the primary variable for controlling the tradeoff
between bit rate and image quality.
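
By way of illustration only, the tradeoff described above can be sketched in Python. The uniform step
size of 2*QP, the sample coefficient values, and all function names below are assumptions made for this
sketch and are not taken from any particular coding standard.

```python
def quantize(coeffs, qp):
    """Uniformly quantize coefficients with an assumed step size of 2*qp."""
    step = 2 * qp
    # int() truncates toward zero, so small coefficients collapse to 0.
    return [int(c / step) for c in coeffs]


def dequantize(levels, qp):
    """Reconstruct approximate coefficients from quantized levels."""
    step = 2 * qp
    return [lv * step for lv in levels]


def nonzero_count(levels):
    """Count nonzero levels, used here as a crude proxy for bits needed."""
    return sum(1 for lv in levels if lv != 0)


def squared_error(coeffs, qp):
    """Total squared reconstruction error at the given quantizer."""
    recon = dequantize(quantize(coeffs, qp), qp)
    return sum((c - r) ** 2 for c, r in zip(coeffs, recon))


# Hypothetical transform coefficients for one block (illustrative values).
block = [312, -45, 27, -12, 9, -6, 4, -3, 2, 1]
for qp in (2, 8, 16):
    print(qp, nonzero_count(quantize(block, qp)), squared_error(block, qp))
```

For these sample values, a larger quantizer leaves fewer nonzero levels to encode and a larger
reconstruction error, which is the behavior the paragraph above describes.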

Visual quality of video depends not only on global measures (like pixel signal-to-noise ratio
(PSNR)), but also on how the error is distributed in space and time. Thus, it is important to maintain
smoothness of the quantizer (which is closely related to the local distortion) across the picture. In fact,
in many scenes, the ideal quantizer selection is a uniform value across the scene. However, such a
scheme will not support the moving of bits to a region of interest from less-important regions, and
furthermore, will provide very little control over the bits used to encode the picture. Thus, it cannot be
used in constant (or near-constant) bit-rate applications (like videophone and video-conferencing over
POTS or ISDN).

The other possibility is to vary the quantizer from macroblock to macroblock within the
constraints of the coding standard being used (for example, in H.263, the quantizer level can change by
a value of at most 2 in either direction). Examples of such schemes are given in the H.263+ TMN8
(Test Model Near-Term 8) and TMN9 documents (see, e.g., ITU - Telecommunications
Standardization Sector, "Video Codec Test Model, Near-Term, Version 9 (TMN9)", Document
Q15-C-15, December 1997). In these schemes, while the frame-level bit target can be accurately met,
there are many, possibly large quantizer changes, both spatially and temporally, which show up
annoyingly in the moving video as undesirable artifacts.
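
The constraint on macroblock-to-macroblock quantizer changes can be illustrated with the following
Python sketch. It is not the TMN8 or TMN9 rate control; the function name and the sample values are
hypothetical, and the limit of 2 simply mirrors the H.263 example given above.

```python
MAX_DELTA = 2  # assumed per-macroblock limit, mirroring the H.263 example


def clamp_quantizers(desired, max_delta=MAX_DELTA):
    """Limit the change between consecutive macroblock quantizers.

    `desired` lists the quantizer wanted for each macroblock in coding order.
    Each output value moves toward the desired value by at most `max_delta`
    relative to the previous output value.
    """
    if not desired:
        return []
    actual = [desired[0]]
    for want in desired[1:]:
        prev = actual[-1]
        delta = max(-max_delta, min(max_delta, want - prev))
        actual.append(prev + delta)
    return actual


print(clamp_quantizers([10, 10, 16, 16, 9]))  # prints [10, 10, 12, 14, 12]
```

In this example, a requested jump from 10 to 16 is spread over several macroblocks, which shows why a
per-macroblock scheme can produce many visible quantizer changes within a frame.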

SUMMARY OF THE INVENTION

As described in the previous section, some video compression algorithms, such as H.263, allow
the quantizers to vary from macroblock to macroblock within a frame, although such algorithms often
limit the magnitude of change in quantization level between horizontally adjacent macroblocks (e.g., a
maximum change of +/-2 levels). In an application with limited bandwidth, this ability to vary the
quantization level within a frame enables the video compression processing to [...] requirements
while optimizing video quality.

The present invention is directed to a method for processing image data, comprising the steps of:
(a) identifying one or more sets of image data corresponding to a region of interest in an image;
(b) identifying one or more sets of image data corresponding to a transition region in the image
located between the region of interest and a least-important region in the image; (c) selecting a first
quantization level for each set of image data in the region of interest; (d) selecting a second
quantization level for each set of image data in the transition region; (e) selecting a third quantization
level for each set of image data in the least-important region; and (f) encoding the image based on the
selected first, second, and third quantization levels.

BRIEF DESCRIPTION OF THE DRAWINGS
Other aspects, features, and advantages of the present invention will become more fully
apparent from the following detailed description, the appended claims, and the accompanying drawings
in which:

Fig. 1 shows an example of a typical image that can be encoded using the present invention; and
Fig. 2 shows a flow diagram of the image processing implemented according to one
embodiment of the present invention for an image, such as the image of Fig. 1.

DETAILED DESCRIPTION 
Fig. 1 shows an example of a typical image 100 that can be encoded using the present invention.
The image in Fig. 1 consists of the head and shoulders of a person 102 positioned in front of
background imagery 104, where the image data corresponding to the head of person 102 varies more in
time (i.e., from frame to frame) than the background imagery. Such a scene is typical of videophone
and video-conferencing applications. In general, during playback, the person in the foreground is of
greater importance to the viewer of image 100 than the background imagery. According to the present
invention, when bit rate is limited, image 100 is encoded such that, during playback, the video quality
of the more-important foreground imagery is greater than the video quality of the less-important
background imagery. This variation in playback video quality within an image is achieved by allowing
the quantizers used during the video compression processing to encode the macroblocks of image 100
to vary within the image. According to the present invention, the selection of quantizers follows a
particular scheme, described as follows.

As shown in Fig. 1, image 100 is divided into three different regions: a foreground region 106
(also referred to as the region of interest (ROI)) consisting of those macroblocks corresponding to the
head of person 102, a background region 108 (also referred to as the least-important region) consisting
of macroblocks corresponding to background imagery 104, and a transition region 110 consisting of
macroblocks located between the foreground region and the background region. According to the
present invention, the macroblocks corresponding to the foreground region 106 are encoded using the
same quantizer QP2, the macroblocks corresponding to the background region 108 are encoded using
the same quantizer QP0, while macroblocks corresponding to the transition region 110 are encoded
using the same quantizer QP1, where, typically, QP0 >= QP1 >= QP2.
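
As a minimal sketch of this assignment, the following Python code maps a grid of region labels to
per-macroblock quantizers. The label strings, the grid layout, and the quantizer values are hypothetical
and are chosen only to satisfy the ordering QP0 >= QP1 >= QP2.

```python
ROI, TRANSITION, BACKGROUND = "roi", "transition", "background"


def quantizer_map(region_labels, qp_by_region):
    """Return per-macroblock quantizers for a 2-D grid of region labels."""
    return [[qp_by_region[label] for label in row] for row in region_labels]


# Hypothetical 3x5 macroblock grid; labels and QP values are illustrative.
labels = [
    [BACKGROUND, TRANSITION, TRANSITION, TRANSITION, BACKGROUND],
    [BACKGROUND, TRANSITION, ROI, TRANSITION, BACKGROUND],
    [BACKGROUND, TRANSITION, TRANSITION, TRANSITION, BACKGROUND],
]
qps = {ROI: 10, TRANSITION: 12, BACKGROUND: 13}  # QP2, QP1, QP0
assert qps[BACKGROUND] >= qps[TRANSITION] >= qps[ROI]

for row in quantizer_map(labels, qps):
    print(row)
```

Because every macroblock in a region shares one quantizer, the quantizer changes only at region
boundaries.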

Fig. 2 shows a flow diagram of the image processing implemented according to one
embodiment of the present invention for an image, such as image 100 of Fig. 1. The processing is
implemented by a video processor that performs image processing routines, such as motion estimation
and compensation, transform application, quantization, run-length encoding, and variable-length
encoding, as part of its overall video compression algorithm. Processing begins with the selection of a
bit target, that is, the number of bits to be used to encode the present image (step 202).

After selecting a bit target, the image is analyzed (step 204) to identify the macroblocks
corresponding to one or more regions of interest (e.g., foreground region 106 corresponding to the head
of person 102 in image 100 of Fig. 1). For purposes of the present invention, this identification can be
performed using any suitable scheme, including automatic schemes or interactive schemes in which the
regions of interest are explicitly identified by a user located at either the encoder or the decoder.

After identifying the regions of interest, the macroblocks corresponding to one or more
transition regions are then identified (step 206). In one embodiment, a macroblock is defined as being
part of a transition region if it borders on at least one macroblock that is part of a region of
interest identified in step 204. The rest of the macroblocks in the image are then considered part of
the least-important background region.
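
One possible reading of steps 204 and 206 is sketched below in Python. The input mask, the label
strings, and the use of 8-connectivity to decide whether a macroblock borders a region of interest are
assumptions of this sketch; the text above does not specify which neighbors count.

```python
def classify_macroblocks(roi_mask):
    """Label each macroblock as 'roi', 'transition', or 'background'.

    `roi_mask` is a 2-D list of booleans, True where a macroblock belongs to
    a region of interest. A non-ROI macroblock is treated as transition if
    any of its 8 neighbors is in the ROI (8-connectivity is an assumption).
    """
    rows, cols = len(roi_mask), len(roi_mask[0])
    labels = [["background"] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if roi_mask[r][c]:
                labels[r][c] = "roi"
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and roi_mask[rr][cc]:
                        labels[r][c] = "transition"
    return labels


# Hypothetical mask with a single ROI macroblock in the middle.
mask = [
    [False, False, False, False, False],
    [False, False, True, False, False],
    [False, False, False, False, False],
]
for row in classify_macroblocks(mask):
    print(row)
```

Macroblocks that are neither in the mask nor adjacent to it keep the default background label, which
corresponds to the least-important region.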

In the example of image 100 of Fig. 1, the region of interest 106 and the transition region 110
are each defined by a single contiguous set of macroblocks. The same is true for the
background region. Depending on the particular image and the particular application, an image may
have two or more different regions of interest and two or more different corresponding transition
regions.
After identifying the macroblocks corresponding to the different regions, an initial quantization
level is selected for each region (step 208). According to the present invention, the macroblocks of
each region are encoded using a single uniform quantization level, where the quantization level may be
different for the different regions. In general, when there are two or more different regions of interest
and/or two or more different transition regions, the quantization level may differ between different
regions of interest and/or between different transition regions, as long as the quantization level is
constant within each particular region. For example, a first region of interest may be more important
than a second region of interest. In that case, it might be desirable to assign a lower quantizer to the
first region of interest than to the second region of interest. In any case, each region of interest will
still have a corresponding transition region that is encoded using its own, possibly different quantizer.

In one implementation, the initial quantization levels are selected based on information related
to the previously encoded image in the video sequence. This initial selection of quantizers may be
based on the previous frame's actual quantizer assignments and bit expenditure, as well as on
comparison of the current bit target and the current motion-compensated distortion with those of the
previous frame. For example, if the previous frame's bit expenditure was higher than the previous bit
target, or if the current bit target is lower than the previous bit target, or if the current distortion is
higher than the previous distortion, then the previous quantizer assignments may need to be increased
for the initial selection for the current frame.
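
The three conditions in this example can be expressed as a short Python sketch. The function name, the
dictionary of region quantizers, the single-step increase, and the upper clipping value of 31 are
assumptions of this sketch; the text above states only that the previous assignments may need to be
increased.

```python
def initial_quantizers(prev_qps, prev_bits, prev_target,
                       cur_target, prev_distortion, cur_distortion, step=1):
    """Pick starting quantizers for the current frame from previous-frame data.

    `prev_qps` maps a region name to the quantizer used for the previous
    frame. The size of the increase (`step`) and the clipping at 31 are
    assumptions, not values given in the text.
    """
    needs_increase = (
        prev_bits > prev_target              # previous frame overspent
        or cur_target < prev_target          # budget is tighter now
        or cur_distortion > prev_distortion  # current frame is harder to code
    )
    if not needs_increase:
        return dict(prev_qps)
    return {region: min(31, qp + step) for region, qp in prev_qps.items()}


prev = {"roi": 10, "transition": 12, "background": 13}
print(initial_quantizers(prev, prev_bits=5200, prev_target=5000,
                         cur_target=5000, prev_distortion=40.0,
                         cur_distortion=40.0))
```

When none of the three conditions holds, the sketch simply reuses the previous frame's quantizers as the
starting point for the current frame.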

In order to avoid abrupt variations in quality between regions, it is desirable that the quantizer
used to encode a transition region be fairly close to the quantizers used to encode both the
corresponding foreground region of interest and the least-important background region. In some video
compression algorithms, such as those conforming to the H.263 framework, the difference between
horizontally adjacent quantizers is already constrained (e.g., never more than 2). Also, it is preferable
that the quantizers actually increase from foreground to transition and from transition to background,
so that the quality in the regions of interest can be optimized compared to the quality in the other
regions. Note that transition regions frequently contain occlusions and artifacts surrounding the region
of interest (like a talking head, for example) and using a lower quantizer here (as compared to the rest
of the least-important region) can be expected to improve the overall visual quality of the video/image.

After selecting initial quantization levels for the various regions, the image is encoded using
those quantizers (step 210). The number of bits used to encode the image is then compared with the bit
target (step 212). If the number of bits used is sufficiently close to the bit target (e.g., within a
specified tolerance), then processing is terminated. Otherwise, if the number of bits used is either too
far below or too far above the bit target, then one or more of the quantizers are
appropriately adjusted (step 214) and processing returns to step 210 to re-encode the image using the
adjusted quantizers. Steps 210-214 are repeated iteratively until the bit target is sufficiently satisfied.
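
The iteration over steps 210-214 can be sketched in Python as follows. The encoder and the adjustment
rule are passed in as functions; `toy_encode` and `toy_adjust` below are purely illustrative stand-ins,
and the limit on the number of passes is an assumption added so that the loop always terminates.

```python
def encode_to_target(encode, qps, bit_target, tolerance, adjust, max_passes=20):
    """Re-encode until the bit count is within `tolerance` of `bit_target`.

    `encode(qps)` returns the number of bits used, and
    `adjust(qps, bits_used, bit_target)` returns new quantizers. Both are
    supplied by the caller; `max_passes` is an assumed safety limit.
    """
    for _ in range(max_passes):
        bits_used = encode(qps)                       # step 210
        if abs(bits_used - bit_target) <= tolerance:  # step 212
            break
        qps = adjust(qps, bits_used, bit_target)      # step 214
    return qps, bits_used


# Toy stand-ins for a real encoder and adjustment rule (illustrative only).
def toy_encode(qps):
    return sum(60000 // qp for qp in qps)


def toy_adjust(qps, bits_used, bit_target):
    delta = 1 if bits_used > bit_target else -1
    return [min(31, max(1, qp + delta)) for qp in qps]


final_qps, final_bits = encode_to_target(
    toy_encode, [10, 12, 13], bit_target=12000, tolerance=500, adjust=toy_adjust)
print(final_qps, final_bits)
```

The toy adjustment rule moves all quantizers together for brevity; it does not reflect any preferred
order of adjustment among the regions.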

If the number of bits used is too small relative to the bit target, then the quantization level
selected for the region of interest (QP2 in Fig. 1) is preferably first decreased. Depending on the
existing differences between the quantizers for the different regions and the constraints applied by the
overall video compression algorithm related to the magnitude of quantization-level changes between
horizontally adjacent macroblocks, it may also be necessary to decrease the quantizer assigned to the
transition region (QP1), which may in turn make it necessary to decrease the quantizer assigned to the
background region (QP0). For example, assume that initially QP2=10, QP1=12, and QP0=13, and that
the maximum allowable quantizer change is 2. Assume further that the number of bits used to encode
the image based on these quantizers is too low. In order to optimize video quality for the assigned bit
target, it is desirable to decrease QP2 to 9. This change results in the need to decrease QP1 to 11 to
avoid violating the maximum allowable horizontal quantizer change between macroblocks of 2. In this
situation, QP0 will not have to be changed. However, if the number of bits is still too low, all three
quantizers will have to be decremented on the next iteration.

Similarly, if the number of bits is too large relative to the bit target, then the quantization level
selected for the background region (QP0) is preferably first increased. Here, too, depending on the
situation, this increase in QP0 may result in the need to increase the quantizer assigned to the transition
region (QP1), which may in turn make it necessary to increase the quantizer assigned to the foreground
region of interest (QP2).
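
The adjustment rules of the two preceding paragraphs can be sketched in Python as follows. The function
names and the unit step are assumptions of this sketch, and clipping to the valid quantizer range of a
particular standard is omitted for brevity. The usage lines reproduce the numerical example given above
(QP2=10, QP1=12, QP0=13, maximum change of 2).

```python
MAX_DIFF = 2  # assumed limit on the quantizer difference between neighboring regions


def use_more_bits(qp2, qp1, qp0, max_diff=MAX_DIFF):
    """Too few bits: lower the ROI quantizer QP2 first, then propagate outward."""
    qp2 -= 1
    if qp1 - qp2 > max_diff:
        qp1 = qp2 + max_diff
    if qp0 - qp1 > max_diff:
        qp0 = qp1 + max_diff
    return qp2, qp1, qp0


def use_fewer_bits(qp2, qp1, qp0, max_diff=MAX_DIFF):
    """Too many bits: raise the background quantizer QP0 first, then propagate inward."""
    qp0 += 1
    if qp0 - qp1 > max_diff:
        qp1 = qp0 - max_diff
    if qp1 - qp2 > max_diff:
        qp2 = qp1 - max_diff
    return qp2, qp1, qp0


first = use_more_bits(10, 12, 13)
second = use_more_bits(*first)
print(first, second)  # prints (9, 11, 13) (8, 10, 12)
```

On the first pass only QP2 and QP1 change, and on the second pass all three quantizers are decremented,
in agreement with the example.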

In general, when too few bits are used, it is desirable to add bits first to the foreground region
of interest, and, when too many bits are used, it is desirable to remove bits first from the background
least-important region. In low-activity scenes, the quantizer selection algorithm of the present
invention may be unable to match sufficiently the frame's bit target. In many cases, especially in
low-activity situations, the frame rate may not be very significant. As such, variations can be allowed
in the frame-level bit expenditure, and/or conformance to channel requirements can be achieved by
varying the instantaneous frame rate.

The present invention has been described in the context of a multi-pass encoding strategy that
assigns different quantizer step sizes to different regions of an image while meeting a frame-level bit
target and ensuring spatial and temporal smoothness in frame quality. This results in improved visual
quality. However, because the scheme is computationally intensive, it may not be able to be used in
real-time applications.

The invention can also be implemented as a real-time "pseudo-multi-pass" scheme based on
modeling the rate-distortion curves at different quantization parameters. According to this scheme, the
number of bits required to encode a macroblock is modeled according to the following equation:

R_q = (X_q * S^(1 + Q/Q_d)) / Q

where:

R_q is the number of bits required to code a macroblock using quantization parameter Q;
X_q is the model constant at Q;
S is the distortion of the macroblock; and
Q_d is the model coefficient in the exponent of S.
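
The rate model can be evaluated directly, as in the following Python sketch. The function and parameter
names are hypothetical, and the numerical inputs in the usage line are chosen only to make the
arithmetic easy to check; they are not measured values.

```python
def model_bits(q, x_q, s, q_d):
    """Estimate bits for one macroblock: R_q = X_q * S**(1 + Q/Q_d) / Q."""
    return x_q * s ** (1.0 + q / q_d) / q


# With Q = Q_d = 8 the exponent is 2, so R_q = 1.0 * 4.0**2 / 8 = 2.0.
print(model_bits(q=8, x_q=1.0, s=4.0, q_d=8.0))  # prints 2.0
```

Since X_q is defined per quantization parameter, a complete implementation would keep one constant for
each value of Q.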

The big skip between an I frame and the following P frame is used to initialize the model. A P frame is
used in this interval (but not coded) to calculate initial model parameters by encoding all macroblocks
at all possible values of Q. This model is constantly updated as the sequence is coded. As such, the
model adapts very well to scene content.
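
One way such an initialization could be carried out is sketched below in Python: the rate equation is
solved for X_q using the bits observed in the trial encodings. Treating Q_d as already known and
averaging the per-macroblock estimates are assumptions of this sketch, as are the function name and the
layout of the input data; the text above does not state how the parameters are computed.

```python
def fit_model_constants(samples, q_d):
    """Estimate X_q for each Q from trial encodings.

    `samples` maps each quantizer Q to a list of (bits, distortion) pairs,
    one pair per macroblock. Each pair gives one estimate
    X_q = bits * Q / distortion**(1 + Q/Q_d); the estimates are averaged.
    """
    constants = {}
    for q, observations in samples.items():
        estimates = [bits * q / s ** (1.0 + q / q_d) for bits, s in observations]
        constants[q] = sum(estimates) / len(estimates)
    return constants


# Hypothetical trial data: two macroblocks encoded at Q = 8.
print(fit_model_constants({8: [(2.0, 4.0), (2.0, 4.0)]}, q_d=8.0))  # prints {8: 1.0}
```

The same calculation could be repeated as frames are coded in order to keep the constants up to date.
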
When encoding of a frame is begun, the frame-level rate control provides a frame-level bit
target. Based on the above rate-distortion model, the quantization parameters are selected for the
different regions. The important region is given a quantizer of (QP-2), the transition region is given QP,
and the background is given a quantizer of (QP+2). This ensures near-transmittability of the
quantization parameters (DQUANTs). The value of QP that comes closest to the frame-level bit target
is chosen.
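
A minimal Python sketch of this selection is given below. It predicts the bits for a frame at each
candidate QP using the rate model and keeps the candidate closest to the target. The function names, the
data layout, the exhaustive search, and the candidate range of 3 to 29 (chosen so that QP-2 and QP+2
stay between 1 and 31) are assumptions of this sketch.

```python
def frame_bits(qp, macroblocks, x, q_d):
    """Predicted bits for a frame when the transition region uses `qp`.

    `macroblocks` is a list of (region, distortion) pairs, and `x` maps
    each quantizer value to its model constant X_q.
    """
    offset = {"roi": -2, "transition": 0, "background": 2}
    total = 0.0
    for region, s in macroblocks:
        q = qp + offset[region]
        total += x[q] * s ** (1.0 + q / q_d) / q
    return total


def choose_qp(macroblocks, x, q_d, bit_target, qp_range=range(3, 30)):
    """Return the QP whose predicted frame bits are closest to the target."""
    return min(
        qp_range,
        key=lambda qp: abs(frame_bits(qp, macroblocks, x, q_d) - bit_target),
    )


# Hypothetical inputs: one macroblock per region, constant X_q, unit distortion.
mbs = [("roi", 1.0), ("transition", 1.0), ("background", 1.0)]
x_const = {q: 120.0 for q in range(1, 32)}
print(choose_qp(mbs, x_const, q_d=50.0, bit_target=37.0))  # prints 10
```

No re-encoding is needed in this sketch, since each candidate is evaluated from the model alone.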

The present invention provides the twin advantages of frame-level rate control and the ability
to adapt the quantizer to reflect the importance of the region, while maintaining spatial and temporal
smoothness of the quantizer. As such, the present invention enables a video compression algorithm to
meet a frame-level bit target, while ensuring spatial and temporal smoothness in frame quality, thus
resulting in improved visual perception during playback.

Although the invention has been described in the context of the talking head paradigm of
videophone and video-conferencing applications, the invention is also applicable for different kinds of
schemes, preferably where the different regions are fairly contiguous.

Similarly, although the present invention has been described in the context of embodiments in
which quantization level corresponds to a specified quantizer parameter that is used to quantize each
transform coefficient, the present invention can also be implemented in alternative embodiments, such
as those in which quantization level corresponds to a quantization table in which each transform
coefficient in a block of coefficients is assigned its own, possibly different quantizer value.

The present invention can be embodied in the form of methods and apparatuses for practicing
those methods. The present invention can also be embodied in the form of program code embodied in
tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable
storage medium, wherein, when the program code is loaded into and executed by a machine, such as a
computer, the machine becomes an apparatus for practicing the invention. The present invention can
also be embodied in the form of program code, for example, whether stored in a storage medium,
loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over
electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the
program code is loaded into and executed by a machine, such as a computer, the machine becomes an
apparatus for practicing the invention. When implemented on a general-purpose processor, the
program code segments combine with the processor to provide a unique device that operates
analogously to specific logic circuits.

It will be further understood that various changes in the details, materials, and arrangements of
the parts which have been described and illustrated in order to explain the nature of this invention may
be made by those skilled in the art without departing from the principle and scope of the invention as
expressed in the following claims.
CLAIMS

What is claimed is: 

1. A method for processing image data, comprising the steps of:
(a) identifying one or more sets of image data corresponding to a region of interest in an
image;
(b) identifying one or more sets of image data corresponding to a transition region in the image
located between the region of interest and a least-important region in the image;
(c) selecting a first quantization level for each set of image data in the region of interest;
(d) selecting a second quantization level for each set of image data in the transition region;
(e) selecting a third quantization level for each set of image data in the least-important region;
and
(f) encoding the image based on the selected first, second, and third quantization levels.

2. The invention of claim 1, wherein the first quantization level is lower than the second
quantization level and the second quantization level is lower than the third quantization level.

3. The invention of claim 1, further comprising the steps of:
(g) comparing the number of bits used to encode the image in step (f) to a bit target for the
image;
(h) adjusting one or more of the first, second, and third quantization levels in accordance with
the comparison of step (g); and
(i) re-encoding the image based on the adjusted quantization levels.

4. The invention of claim 3, wherein steps (g)-(i) are repeated until the number of bits used to
encode the image is sufficiently close to the bit target.

5. The invention of claim 3, wherein:
if the number of bits in step (g) is sufficiently low, then step (h) comprises the step of
decreasing the first quantization level and, if appropriate, decreasing the second quantization level, and
then, if appropriate, decreasing the third quantization level; and
if the number of bits in step (g) is sufficiently high, then step (h) comprises the step of
increasing the third quantization level and, if appropriate, increasing the second quantization level, and
then, if appropriate, increasing the first quantization level.

6. The invention of claim 1, wherein the image has two or more regions of interest and each
region of interest is assigned its own quantization level, which may differ between regions of interest.

7. The invention of claim 1, wherein magnitudes of differences between the first and second
quantization levels and between the second and third quantization levels are within a specified limit.



8. The invention of claim 1, wherein the region of interest corresponds to a talking head and the
least-important region corresponds to a relatively stationary background.

9. The invention of claim 1, wherein at least one of the first, second, and third quantization levels
is selected based on modeling of rate-distortion curves at different quantization levels.

10. The invention of claim 9, wherein a number of bits used to encode a macroblock is modeled as
follows:

R_q = (X_q * S^(1 + Q/Q_d)) / Q

where:

R_q is a number of bits used to code a macroblock using quantization parameter Q;
X_q is a model constant at Q;
S is a distortion of the macroblock; and
Q_d is a model coefficient in exponent of S.


